Abstract

The base-$k$ Copeland-Erdős sequence given by an infinite set $A$ of positive integers is the infinite sequence $\mathrm{CE}_k(A)$ formed by concatenating the base-$k$ representations of the elements of $A$ in numerical order. This paper concerns the following four quantities.

• The finite-state dimension $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A))$, a finite-state version of classical Hausdorff dimension introduced in 2001.

• The finite-state strong dimension $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A))$, a finite-state version of classical packing dimension introduced in 2004. This is a dual of $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A))$ satisfying $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \dim_{\mathrm{FS}}(\mathrm{CE}_k(A))$.

• The zeta-dimension $\mathrm{Dim}_\zeta(A)$, a kind of discrete fractal dimension discovered many times over the past few decades.

• The lower zeta-dimension $\dim_\zeta(A)$, a dual of $\mathrm{Dim}_\zeta(A)$ satisfying $\dim_\zeta(A) \le \mathrm{Dim}_\zeta(A)$.

We prove the following.

1. $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \dim_\zeta(A)$. This extends the 1946 proof by Copeland and Erdős that the sequence $\mathrm{CE}_k(\mathrm{PRIMES})$ is Borel normal.

2. $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \mathrm{Dim}_\zeta(A)$.

3. These bounds are tight in the strong sense that these four quantities can have (simultaneously) any four values in $[0, 1]$ satisfying the four above-mentioned inequalities.

1 Introduction 



In the early years of the twenty-first century, two quantities have emerged as robust, well-behaved, asymptotic measures of the finite-state information content of a given sequence $S$ over a finite alphabet $\Sigma$. These two quantities, the finite-state dimension $\dim_{\mathrm{FS}}(S)$ and the finite-state strong dimension $\mathrm{Dim}_{\mathrm{FS}}(S)$ (defined precisely in Section 3), are duals of one another satisfying $0 \le \dim_{\mathrm{FS}}(S) \le \mathrm{Dim}_{\mathrm{FS}}(S) \le 1$ for all $S$. They are mathematically well-behaved, because they are natural effectivizations of the two most important notions of fractal dimension. Specifically, finite-state dimension is a finite-state version of classical Hausdorff dimension introduced by Dai, Lathrop, Lutz, and Mayordomo [10], while finite-state strong dimension is a finite-state version of classical packing dimension introduced by Athreya, Hitchcock, Lutz, and Mayordomo [3]. Both finite-state dimensions, $\dim_{\mathrm{FS}}(S)$ and $\mathrm{Dim}_{\mathrm{FS}}(S)$, are robust in that each has been exactly characterized in terms of finite-state gamblers [10, 3], information-lossless finite-state compressors [10, 3], block-entropy rates [5], and finite-state predictors in the log-loss model [14, 3]. In each case, the characterizations of $\dim_{\mathrm{FS}}(S)$ and $\mathrm{Dim}_{\mathrm{FS}}(S)$ are exactly dual, differing only in that a limit inferior appears in one characterization where a limit superior appears in the other. Hence, whether we think of finite-state information in terms of gambling, data compression, block entropy, or prediction, $\dim_{\mathrm{FS}}(S)$ and $\mathrm{Dim}_{\mathrm{FS}}(S)$ are the lower and upper asymptotic information contents of $S$, as perceived by finite-state automata.

For any of the dimensions mentioned above, whether classical or finite-state, calculating the 
dimension of a particular object usually involves separate upper and lower bound arguments, with 
the lower bound typically more difficult. For example, establishing that $\dim_{\mathrm{FS}}(S) = \alpha$ for some
particular sequence $S$ and $\alpha \in (0, 1)$ usually involves separate proofs that $\alpha$ is an upper bound
and a lower bound for $\dim_{\mathrm{FS}}(S)$. The upper bound argument, usually carried out by exhibiting a
particular finite-state gambler (or predictor, or compressor) that performs well on S, is typically 
straightforward. On the other hand, the lower bound argument, proving that no finite-state gambler 
(or predictor, or compressor) can perform better on S, is typically more involved. 

This paper exhibits and analyzes a flexible method for constructing sequences satisfying given 
lower bounds on $\dim_{\mathrm{FS}}(S)$ and/or $\mathrm{Dim}_{\mathrm{FS}}(S)$. The method is directly motivated by work in the
first half of the twentieth century on Borel normal numbers. We now review the relevant aspects 
of this work. 

In 1909, Borel @j defined a sequence S over a finite alphabet £ to be normal if, for every string 
$w \in \Sigma^+$,

$$\lim_{n \to \infty} \frac{1}{n} \left| \{\, i < n \mid S[i..i+|w|-1] = w \,\} \right| = |\Sigma|^{-|w|},$$

where $S[i..j]$ is the string consisting of the $i$th through $j$th symbols in $S$. That is, $S$ is normal (now
also called Borel normal) if all the strings of each length appear equally often, asymptotically, in
$S$. (Note: Borel was interested in numbers, not sequences, and defined a real number to be normal
in base $k$ if its base-$k$ expansion is normal in the above sense. Subsequent authors mentioned here
also stated their results in terms of real numbers, but we systematically restate their work in terms 
of sequences.) 

The first explicit example of a normal sequence was produced in 1933 by Champernowne [7], who proved that the sequence

$$S = 123456789101112\cdots, \tag{1.1}$$

formed by concatenating the decimal expansions of the positive integers in order, is normal over the alphabet of decimal digits. Of course there is nothing special about decimal here, i.e., Champernowne's argument proves that, for any $k \ge 2$, the sequence (now called the base-$k$ Champernowne sequence) formed by concatenating the base-$k$ expansions of the positive integers in order is normal over the alphabet $\Sigma_k = \{0, 1, \dots, k-1\}$.

Champernowne [7] conjectured that the sequence

$$S = 235711131719232931\cdots, \tag{1.2}$$

formed by concatenating the decimal expansions of the prime numbers in order, is also normal. Copeland and Erdős [8] proved this conjecture in 1946, and it is the method of their proof that is of interest here. Given an infinite set $A$ of positive integers and an integer $k \ge 2$, define the base-$k$ Copeland-Erdős sequence of $A$ to be the sequence $\mathrm{CE}_k(A)$ over the alphabet $\Sigma_k = \{0, 1, \dots, k-1\}$ formed by concatenating the base-$k$ expansions of the elements of $A$ in order. The sequences (1.1) and (1.2) are thus $\mathrm{CE}_{10}(\mathbb{Z}^+)$ and $\mathrm{CE}_{10}(\mathrm{PRIMES})$, respectively, where $\mathbb{Z}^+$ is the set of all positive integers and PRIMES is the set of prime numbers. Say that a set $A \subseteq \mathbb{Z}^+$ satisfies the Copeland-Erdős hypothesis if, for every real number $\alpha < 1$, for all sufficiently large $n \in \mathbb{Z}^+$,

$$|A \cap \{1, 2, \dots, n\}| > n^{\alpha}.$$
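The definition of a Copeland-Erdős sequence can be made concrete with a short sketch. The helper names below (`to_base`, `primes`, `ce_prefix`) are illustrative choices, not taken from the paper, and the digit strings assume $k \le 10$ so that each base-$k$ digit is a single character.

```python
from itertools import count

def to_base(n, k):
    """Standard base-k representation of a positive integer n (assumes 2 <= k <= 10)."""
    digits = []
    while n > 0:
        n, r = divmod(n, k)
        digits.append(str(r))
    return "".join(reversed(digits))

def primes():
    """Yield the primes in increasing order by trial division (adequate for short prefixes)."""
    found = []
    for n in count(2):
        if all(n % p for p in found if p * p <= n):
            found.append(n)
            yield n

def ce_prefix(elements, k, length):
    """First `length` symbols of the base-k concatenation of an increasing iterable of integers."""
    pieces, total = [], 0
    for a in elements:
        s = to_base(a, k)
        pieces.append(s)
        total += len(s)
        if total >= length:
            break
    return "".join(pieces)[:length]

# Prefixes of the two decimal sequences discussed in the text.
champernowne_10 = ce_prefix(count(1), 10, 15)
primes_10 = ce_prefix(primes(), 10, 18)
```

Passing any other increasing generator of positive integers in place of `count(1)` or `primes()` yields a prefix of the corresponding sequence.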

Copeland and Erdős [8] proved that every set $A \subseteq \mathbb{Z}^+$ satisfying the Copeland-Erdős hypothesis
has the property that, for every $k \ge 2$, the sequence $\mathrm{CE}_k(A)$ is normal over the alphabet $\Sigma_k$. The
normality of the sequence (1.2), and of all the sequences $\mathrm{CE}_k(\mathrm{PRIMES})$, follows immediately by
the Prime Number Theorem El > which says that 

$$\lim_{n \to \infty} \frac{|\mathrm{PRIMES} \cap \{1, 2, \dots, n\}| \, \ln n}{n} = 1,$$

whence PRIMES certainly satisfies the Copeland-Erdos hypothesis. 

The significance of the Copeland-Erdős result for finite-state dimension lies in the fact that the Borel normal sequences are known to be precisely those sequences that have finite-state dimension 1 [16, 15]. The Copeland-Erdős result thus says that the sequences $\mathrm{CE}_k(A)$ have finite-state dimension 1, provided only that $A$ is "sufficiently dense" (i.e., satisfies the Copeland-Erdős hypothesis).

In this paper, we generalize the Copeland-Erdős result by showing that a parametrized version of the Copeland-Erdős hypothesis for $A$ gives lower bounds on the finite-state dimension of $\mathrm{CE}_k(A)$ that vary continuously with (in fact, coincide with) the parameter. The parametrization that achieves this is a quantitative measure of the asymptotic density of $A$ that has been discovered several times by researchers in various areas over the past few decades. Specifically, define the zeta-dimension of a set $A \subseteq \mathbb{Z}^+$ to be

$$\mathrm{Dim}_\zeta(A) = \inf \{\, s \mid \zeta_A(s) < \infty \,\},$$

where the $A$-zeta function $\zeta_A : [0, \infty) \to [0, \infty]$ is defined by

$$\zeta_A(s) = \sum_{n \in A} n^{-s}.$$

It is easy to see (and was proven by Cahen [S] in 1894; see also [UE3) that zeta-dimension admits 
the "entropy characterization" 



$$\mathrm{Dim}_\zeta(A) = \limsup_{n \to \infty} \frac{\log |A \cap \{1, \dots, n\}|}{\log n}. \tag{1.3}$$

It is then natural to define the lower zeta-dimension of $A$ to be

$$\dim_\zeta(A) = \liminf_{n \to \infty} \frac{\log |A \cap \{1, \dots, n\}|}{\log n}. \tag{1.4}$$
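As a rough numerical illustration of the entropy characterization, the ratio $\log|A \cap \{1,\dots,n\}| / \log n$ can be evaluated at a single finite $n$. This is only a sketch: a finite ratio is not a limit superior or limit inferior, and the function name `density_exponent` and the two example sets are assumptions made here for illustration.

```python
import math

def density_exponent(members, n):
    """log|A ∩ {1..n}| / log n for A given as a collection of positive integers."""
    cnt = sum(1 for a in members if a <= n)
    return math.log(cnt) / math.log(n)

N = 10**6
squares = [i * i for i in range(1, 1001)]   # perfect squares up to N
evens = range(2, N + 1, 2)                  # even numbers up to N

ratio_squares = density_exponent(squares, N)  # close to 1/2
ratio_evens = density_exponent(evens, N)      # close to 1, approaching it slowly
```

For the squares the ratio equals $1/2$ at every perfect-square cutoff, which matches the intuition that a set with about $n^{1/2}$ elements below $n$ has zeta-dimension $1/2$.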

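Various properties of zeta-dimension and lower zeta-dimension, along with extensive historical references, appear in the recent paper [11], but none of this material is needed to follow our technical arguments in the present paper.

It is evident that a set $A \subseteq \mathbb{Z}^+$ satisfies the Copeland-Erdős hypothesis if and only if $\dim_\zeta(A) = 1$. The Copeland-Erdős result thus says that, for all infinite $A \subseteq \mathbb{Z}^+$ and $k \ge 2$,

$$\dim_\zeta(A) = 1 \implies \dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) = 1. \tag{1.5}$$

Our main theorem extends (1.5) by showing that, for all infinite $A \subseteq \mathbb{Z}^+$ and $k \ge 2$,

$$\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \dim_\zeta(A), \tag{1.6}$$

and, dually,

$$\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \mathrm{Dim}_\zeta(A). \tag{1.7}$$

Moreover, these bounds are tight in the following strong sense. Let $A \subseteq \mathbb{Z}^+$ be infinite, let $k \ge 2$, and let $\alpha = \dim_\zeta(A)$, $\beta = \mathrm{Dim}_\zeta(A)$, $\gamma = \dim_{\mathrm{FS}}(\mathrm{CE}_k(A))$, $\delta = \mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A))$. Then, by (1.6), (1.7), and elementary properties of these dimensions, we must have the inequalities

$$\begin{array}{ccccccc}
 & & \gamma & \le & \delta & \le & 1 \\
 & & \rotatebox[origin=c]{90}{$\le$} & & \rotatebox[origin=c]{90}{$\le$} & & \\
0 & \le & \alpha & \le & \beta. & &
\end{array} \tag{1.8}$$

Our main theorem also shows that, for any $\alpha, \beta, \gamma, \delta$ satisfying (1.8) and any $k \ge 2$, there is an infinite set $A \subseteq \mathbb{Z}^+$ such that $\dim_\zeta(A) = \alpha$, $\mathrm{Dim}_\zeta(A) = \beta$, $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) = \gamma$, and $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) = \delta$. Thus the inequalities

$$\begin{array}{ccccccc}
 & & \dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) & \le & \mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) & \le & 1 \\
 & & \rotatebox[origin=c]{90}{$\le$} & & \rotatebox[origin=c]{90}{$\le$} & & \\
0 & \le & \dim_\zeta(A) & \le & \mathrm{Dim}_\zeta(A) & &
\end{array} \tag{1.9}$$

are the only constraints that these four quantities obey in general.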

The rest of this paper is organized as follows. Section 2 presents basic notation and terminology. Section 3 reviews the definitions of finite-state dimension and finite-state strong dimension and gives useful characterizations of zeta-dimension and lower zeta-dimension. Section 4 presents our main theorem.



2 Preliminaries 

We write $\mathbb{Z}^+ = \{1, 2, \dots\}$ for the set of positive integers. For an infinite set $A \subseteq \mathbb{Z}^+$, we often write $A = \{a_1 < a_2 < \cdots\}$ to indicate that $a_1, a_2, \dots$ is an enumeration of $A$ in increasing numerical order. The quantifier $\exists^\infty n$ means "there exist infinitely many $n \in \mathbb{Z}^+$ such that ...", while the dual quantifier $\forall^\infty n$ means "for all but finitely many $n \in \mathbb{Z}^+$, ...".

We work in the alphabets $\Sigma_k = \{0, 1, \dots, k-1\}$ for $k \ge 2$. The set of all (finite) strings over $\Sigma_k$ is $\Sigma_k^*$, and the set of all (infinite) sequences over $\Sigma_k$ is $\Sigma_k^\infty$. We write $\lambda$ for the empty string. Given a sequence $S \in \Sigma_k^\infty$ and integers $0 \le i \le j$, we write $S[i..j]$ for the string consisting of the $i$th through $j$th symbols in $S$. In particular, $S[0..n-1]$ is the string consisting of the first $n$ symbols of $S$. We write $w \sqsubseteq z$ to indicate that the string $w$ is a prefix of the string or sequence $z$.

We use the notation $\Delta(\Sigma_k)$ for the set of all probability measures on $\Sigma_k$, i.e., all functions $\pi : \Sigma_k \to [0, 1]$ satisfying $\sum_{a \in \Sigma_k} \pi(a) = 1$. Identifying each probability measure $\pi \in \Delta(\Sigma_k)$ with the vector $(\pi(0), \dots, \pi(k-1))$ enables us to regard $\Delta(\Sigma_k)$ as a closed simplex in the $k$-dimensional Euclidean space $\mathbb{R}^k$. We write $\Delta_{\mathbb{Q}}(\Sigma_k)$ for the set of all rational-valued probability measures $\pi \in \Delta(\Sigma_k)$. It is often convenient to represent a positive probability measure $\pi \in \Delta_{\mathbb{Q}}(\Sigma_k)$ by a vector $\vec{a} = (a_0, \dots, a_{k-1})$ of positive integers such that, for all $i \in \Sigma_k$, $\pi(i) = \frac{a_i}{n}$, where $n = \sum_{i=0}^{k-1} a_i$. In this case, $\vec{a}$ is called a partition of $n$. When $\vec{a}$ represents $\pi$ in this way, we write $\pi = \frac{\vec{a}}{n}$.

The $k$-ary Shannon entropy [5] of a probability measure $\pi \in \Delta(\Sigma_k)$ is

$$\mathcal{H}_k(\pi) = E_\pi \log_k \frac{1}{\pi(i)} = \sum_{i=0}^{k-1} \pi(i) \log_k \frac{1}{\pi(i)},$$

where $E_\pi$ denotes mathematical expectation relative to the probability measure $\pi$ and we stipulate that $0 \log_k \frac{1}{0} = 0$, so that $\mathcal{H}_k$ is continuous on the simplex $\Delta(\Sigma_k)$. The $k$-ary Kullback-Leibler
divergence [S] between probability measures tt, t G A(Efc) is 

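$$\mathcal{D}_k(\pi \,\|\, \tau) = E_\pi \log_k \frac{\pi(i)}{\tau(i)} = \sum_{i=0}^{k-1} \pi(i) \log_k \frac{\pi(i)}{\tau(i)}.$$

It is well-known that $\mathcal{D}_k(\pi \,\|\, \tau) \ge 0$, with equality if and only if $\pi = \tau$.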
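The two information-theoretic quantities just defined are easy to compute directly. The following is a minimal sketch, with function names chosen here for illustration; it represents a probability measure on $\Sigma_k$ as a list of $k$ floats and applies the convention $0 \log_k \frac{1}{0} = 0$.

```python
import math

def shannon_entropy(pi, k):
    """k-ary Shannon entropy of a probability vector pi, skipping zero-probability symbols."""
    return sum(p * math.log(1.0 / p, k) for p in pi if p > 0)

def kl_divergence(pi, tau, k):
    """k-ary Kullback-Leibler divergence of pi from tau (infinite if tau misses mass that pi has)."""
    total = 0.0
    for p, t in zip(pi, tau):
        if p > 0:
            if t == 0:
                return math.inf
            total += p * math.log(p / t, k)
    return total

uniform3 = [1 / 3, 1 / 3, 1 / 3]
point_mass3 = [1.0, 0.0, 0.0]
skewed3 = [0.5, 0.25, 0.25]
```

With these vectors, the uniform measure has entropy 1, the point mass has entropy 0, and the divergence between two distinct measures is strictly positive.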

For $k \ge 2$ and $n \in \mathbb{Z}^+$, we write $\sigma_k(n)$ for the standard base-$k$ representation of $n$. Note that $\sigma_k(n) \in \Sigma_k^*$ and that the length of (number of symbols in) $\sigma_k(n)$ is $|\sigma_k(n)| = 1 + \lfloor \log_k n \rfloor$. Note also that, if $A = \{a_1 < a_2 < \cdots\} \subseteq \mathbb{Z}^+$ is infinite, then the base-$k$ Copeland-Erdős sequence of $A$ is

$$\mathrm{CE}_k(A) = \sigma_k(a_1)\sigma_k(a_2)\cdots \in \Sigma_k^\infty.$$

Given a set $A \subseteq \mathbb{Z}^+$ and $k, n \in \mathbb{Z}^+$, we write $A_{=n} = \{a \in A \mid |\sigma_k(a)| = n\}$ in contexts where the base $k$ is clear.

We write $\log n$ for $\log_2 n$.

3 The Four Dimensions 

As promised in the introduction, this section gives precise definitions of finite-state dimension and 
finite-state strong dimension. It also gives a useful bound on the success of finite-state gamblers 
and useful characterizations of zeta-dimension and lower zeta-dimension. 

Definition. A finite-state gambler (FSG) is a 5-tuple

$$G = (Q, \Sigma_k, \delta, \beta, q_0),$$

where $Q$ is a nonempty, finite set of states; $\Sigma_k = \{0, 1, \dots, k-1\}$ is a finite alphabet ($k \ge 2$); $\delta : Q \times \Sigma_k \to Q$ is the transition function; $\beta : Q \to \Delta_{\mathbb{Q}}(\Sigma_k)$ is the betting function; and $q_0 \in Q$ is the initial state.

Finite-state gamblers have been investigated by Schnorr and Stimm JH], Feder |12j . and others. 
The transition function $\delta$ is extended in the standard way to a function $\delta : Q \times \Sigma_k^* \to Q$. For
$w \in \Sigma_k^*$, we use the abbreviation $\delta(w) = \delta(q_0, w)$.

Definition. ([10]). Let $G = (Q, \Sigma_k, \delta, \beta, q_0)$ be an FSG, and let $s \in [0, \infty)$. The $s$-gale of $G$ is the function

$$d_G^{(s)} : \Sigma_k^* \to [0, \infty)$$

defined by the recursion

$$d_G^{(s)}(\lambda) = 1,$$
$$d_G^{(s)}(wa) = k^s \, d_G^{(s)}(w) \, \beta(\delta(w))(a) \tag{3.1}$$

for all $w \in \Sigma_k^*$ and $a \in \Sigma_k$.

Intuitively, $d_G^{(s)}(w)$ is the amount of money that the gambler $G$ has after betting on the successive symbols in the string $w$. The parameter $s$ controls the payoffs via equation (3.1). If $s = 1$, then the payoffs are fair in the sense that the conditional expected value of $d_G^{(1)}(wa)$, given that $w$ has occurred and the symbols $a \in \Sigma_k$ are all equally likely to follow $w$, is precisely $d_G^{(1)}(w)$. If $s < 1$, then the payoffs are unfair.

We repeatedly use the obvious fact that $d_G^{(s)}(w) \le k^{s|w|}$ holds for all $s$ and $w$.
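A small simulation may help make the $s$-gale recursion concrete. The sketch below assumes the gambler starts with capital 1 and represents an FSG by plain dictionaries; the class name `FSG`, its fields, and the two-state example gambler are illustrative choices, not constructions from the paper.

```python
from fractions import Fraction

class FSG:
    """A finite-state gambler over the alphabet {0, ..., k-1}."""

    def __init__(self, k, delta, beta, q0):
        self.k = k          # alphabet size
        self.delta = delta  # dict: (state, symbol) -> next state
        self.beta = beta    # dict: state -> list of k rational bets summing to 1
        self.q0 = q0        # initial state

    def gale(self, s, w):
        """Capital after betting on the symbols of w, with starting capital 1 (an assumption)."""
        q, capital = self.q0, 1.0
        for a in w:
            capital *= (self.k ** s) * float(self.beta[q][a])
            q = self.delta[(q, a)]
        return capital

# Hypothetical binary gambler: the state remembers the last symbol seen,
# and the gambler bets 2/3 that this symbol repeats.
example = FSG(
    k=2,
    delta={(q, a): a for q in (0, 1) for a in (0, 1)},
    beta={0: [Fraction(2, 3), Fraction(1, 3)], 1: [Fraction(1, 3), Fraction(2, 3)]},
    q0=0,
)
```

On the string 0000 with $s = 1$ this gambler multiplies its capital by $4/3$ at every step; on any string, its capital after $|w|$ symbols never exceeds $k^{s|w|}$, since each bet is at most 1.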

Definition. Let $G = (Q, \Sigma_k, \delta, \beta, q_0)$ be an FSG, let $s \in [0, \infty)$, and let $S \in \Sigma_k^\infty$.

1. $G$ $s$-succeeds on $S$ if

$$\limsup_{n \to \infty} d_G^{(s)}(S[0..n-1]) = \infty.$$

2. $G$ strongly $s$-succeeds on $S$ if

$$\liminf_{n \to \infty} d_G^{(s)}(S[0..n-1]) = \infty.$$

Definition. Let $S \in \Sigma_k^\infty$.

1. [10]. The finite-state dimension of $S$ is

$$\dim_{\mathrm{FS}}(S) = \inf \{\, s \mid \text{there is an FSG that } s\text{-succeeds on } S \,\}.$$

2. [3]. The finite-state strong dimension of $S$ is

$$\mathrm{Dim}_{\mathrm{FS}}(S) = \inf \{\, s \mid \text{there is an FSG that strongly } s\text{-succeeds on } S \,\}.$$

It is easy to verify that $0 \le \dim_{\mathrm{FS}}(S) \le \mathrm{Dim}_{\mathrm{FS}}(S) \le 1$ for all $S \in \Sigma_k^\infty$. More properties of these finite-state dimensions, including their relationships to classical Hausdorff and packing dimensions, respectively, may be found in [10, 3].

It is useful to have a measure of the size of a finite-state gambler. This size depends on the 
alphabet size, the number of states, and the least common denominator of the values of the betting 
function in the following way. 

Definition. The size of an FSG $G = (Q, \Sigma_k, \delta, \beta, q_0)$ is

$$\mathrm{size}(G) = (k + l)|Q|,$$

where $l = \min \{\, l \in \mathbb{Z}^+ \mid (\forall q \in Q)(\forall i \in \Sigma_k)\; l\beta(q)(i) \in \mathbb{Z} \,\}$.

Observation 3.1. For each $k \ge 2$ and $t \in \mathbb{Z}^+$, there are, up to renaming of states, fewer than $t^2 (2t)^t$ finite-state gamblers $G$ with $\mathrm{size}(G) \le t$.

Proof. Given $k, l, m \in \mathbb{Z}^+$ with $k \ge 2$, let $\mathcal{G}_{k,l,m}$ be the set of all FSGs $G = (Q, \Sigma_k, \delta, \beta, q_0)$ with $|Q| = m$ satisfying $l\beta(q)(i) \in \mathbb{Z}$ for all $q \in Q$ and $i \in \Sigma_k$. Equivalently, $\mathcal{G}_{k,l,m}$ is the set of all FSGs $G = (Q, \Sigma_k, \delta, \beta, q_0)$ such that $Q = \{0, \dots, m-1\}$ and $\beta : Q \to \Delta_{\mathbb{Q},l}(\Sigma_k)$, where

$$\Delta_{\mathbb{Q},l}(\Sigma_k) = \{\, \pi \in \Delta_{\mathbb{Q}}(\Sigma_k) \mid (\forall i \in \Sigma_k)\; l\pi(i) \in \mathbb{Z} \,\}.$$

Since $|\Delta_{\mathbb{Q},l}(\Sigma_k)| = \binom{k+l-1}{k-1}$, it is easy to see that

$$|\mathcal{G}_{k,l,m}| = m^{km+1} \binom{k+l-1}{k-1}^m. \tag{3.2}$$

Now fix $k \ge 2$ and $t \in \mathbb{Z}^+$, and let $\mathcal{G}_t$ be the set of all FSGs $G = (Q, \Sigma_k, \delta, \beta, q_0)$ with $\mathrm{size}(G) \le t$. Our objective is to show that $|\mathcal{G}_t| < t^2 (2t)^t$. For each $1 \le j \le t$, there are at most $j$ pairs $(l, m)$ such that $(k + l)m = j$, and, for each of these pairs $(l, m)$, (3.2) tells us that $|\mathcal{G}_{k,l,m}| \le (2j)^j$, so

$$|\mathcal{G}_t| \le \sum_{j=1}^{t} j (2j)^j < t^2 (2t)^t.$$

□
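The counting step in this proof can be spot-checked numerically. The sketch below evaluates the count $m^{km+1}\binom{k+l-1}{k-1}^m$ of gamblers with $m$ states and betting denominators dividing $l$ (the reading of (3.2) adopted here), and compares it with $(2j)^j$ for $j = (k+l)m$ over a small range of parameters. The function names and the chosen ranges are illustrative assumptions.

```python
from math import comb

def count_fsgs(k, l, m):
    """m^(km) transition tables, m initial states, C(k+l-1, k-1)^m betting functions."""
    return m ** (k * m + 1) * comb(k + l - 1, k - 1) ** m

def bound_holds(k, l, m):
    """Check the per-pair bound |G_{k,l,m}| <= (2j)^j with j = (k + l) * m."""
    j = (k + l) * m
    return count_fsgs(k, l, m) <= (2 * j) ** j

all_ok = all(
    bound_holds(k, l, m)
    for k in range(2, 6)
    for l in range(1, 9)
    for m in range(1, 7)
)
```

The bound is far from tight: the count is at most $j^j \cdot 2^j$ because $km + 1 \le j$ and the binomial coefficient is at most $2^{k+l-1}$.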

In general, an $s$-gale is a function $d : \Sigma_k^* \to [0, \infty)$ satisfying

$$d(w) = k^{-s} \sum_{a=0}^{k-1} d(wa)$$

for all $w \in \Sigma_k^*$ [15]. It is clear that $d_G^{(s)}$ is an $s$-gale for every FSG $G$ and every $s \in [0, \infty)$. The case
k = 2 of the following lemma was proven in [T^]. The extension to arbitrary k > 2 is routine. 

Lemma 3.2. (HS|). If s G [0, 1] and <i is an s-gale, then, for all w G j G N, and < a £ 1, 

i/tere are fewer than ^- strings m£Ej of length j for which d{u) > a. 

The following lemma will be useful in proving our main theorem. 

Lemma 3.3. For each $s, \alpha \in (0, \infty)$ and $k, n, t \in \mathbb{Z}^+$ with $k \ge 2$, there are fewer than

$$\frac{k^{2s} n^s t^2 (2t)^t}{\alpha (k^s - 1)}$$

integers $m \in \{1, \dots, n\}$ for which

$$\max_{\mathrm{size}(G) \le t} d_G^{(s)}(\sigma_k(m)) > \alpha,$$

where the maximum is taken over all FSGs $G = (Q, \Sigma_k, \delta, \beta, q_0)$ with $\mathrm{size}(G) \le t$.

Proof. Let $s$, $\alpha$, $k$, $n$, and $t$ be as given, and let $\mathcal{G}_t$ be the set of all FSGs $G = (Q, \Sigma_k, \delta, \beta, q_0)$ with $\mathrm{size}(G) \le t$. For each $j \in \mathbb{Z}^+$ and $G \in \mathcal{G}_t$, Lemma 3.2 tells us that there are fewer than $\frac{k^{sj}}{\alpha}$ strings $u \in \Sigma_k^*$ of length $j$ for which $d_G^{(s)}(u) > \alpha$. It follows by Observation 3.1 that, for each $j \in \mathbb{Z}^+$, there are fewer than $t^2 (2t)^t \frac{k^{sj}}{\alpha}$ strings $u \in \Sigma_k^*$ of length $j$ for which

$$\max_{G \in \mathcal{G}_t} d_G^{(s)}(u) > \alpha$$

holds. Since

$$\sum_{j=1}^{|\sigma_k(n)|} t^2 (2t)^t \frac{k^{sj}}{\alpha} = \frac{t^2 (2t)^t}{\alpha} \sum_{j=1}^{1 + \lfloor \log_k n \rfloor} k^{sj} < \frac{k^{2s} n^s t^2 (2t)^t}{\alpha (k^s - 1)},$$

the lemma follows. □

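The zeta-dimension $\mathrm{Dim}_\zeta(A)$ and lower zeta-dimension $\dim_\zeta(A)$ of a set $A$ of positive integers were defined in the introduction. The following lemma gives useful characterizations of these quantities in terms of the increasing enumeration of $A$.

Lemma 3.4. Let $A = \{a_1 < a_2 < \cdots\}$ be an infinite set of positive integers.

1. $\dim_\zeta(A) = \inf \{t \ge 0 \mid (\exists^\infty n)\, a_n^t > n\} = \inf \{t \ge 0 \mid (\exists^\infty n)\, a_n^t \ge n\}$
$\phantom{1. \dim_\zeta(A)} = \sup \{t \ge 0 \mid (\forall^\infty n)\, a_n^t < n\} = \sup \{t \ge 0 \mid (\forall^\infty n)\, a_n^t \le n\}$.

2. $\mathrm{Dim}_\zeta(A) = \inf \{t \ge 0 \mid (\forall^\infty n)\, a_n^t > n\} = \inf \{t \ge 0 \mid (\forall^\infty n)\, a_n^t \ge n\}$
$\phantom{2. \mathrm{Dim}_\zeta(A)} = \sup \{t \ge 0 \mid (\exists^\infty n)\, a_n^t < n\} = \sup \{t \ge 0 \mid (\exists^\infty n)\, a_n^t \le n\}$.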

Proof. Let $A$ be as given. For each $R \in \{<, \le, \ge, >\}$, define the sets

$$I_R = \{\, t \ge 0 \mid (\exists^\infty n)\; a_n^t \mathrel{R} n \,\},$$
$$J_R = \{\, t \ge 0 \mid (\forall^\infty n)\; a_n^t \mathrel{R} n \,\}.$$

Our task is then to prove that

$$\dim_\zeta(A) = \inf I_> = \inf I_\ge = \sup J_< = \sup J_\le \tag{3.3}$$

and

$$\mathrm{Dim}_\zeta(A) = \inf J_> = \inf J_\ge = \sup I_< = \sup I_\le. \tag{3.4}$$

Note that each of the pairs $(J_<, I_\ge)$, $(J_\le, I_>)$, $(I_<, J_\ge)$, $(I_\le, J_>)$ partitions $[0, \infty)$ into two nonempty subsets with every element of the left component less than every element of the right component, the left components satisfying

$$0 \in J_< \subseteq J_\le \cap I_< \subseteq J_\le \cup I_< \subseteq I_\le,$$

and the right components satisfying

$$(1, \infty) \subseteq J_> \subseteq J_\ge \cap I_> \subseteq J_\ge \cup I_> \subseteq I_\ge.$$

It follows immediately from this that

$$\sup J_< = \inf I_\ge \le \sup J_\le = \inf I_>$$

and

$$\sup I_< = \inf J_\ge \le \sup I_\le = \inf J_>.$$

Hence, to prove (3.3) and (3.4), it suffices to show that

$$\inf I_> \le \dim_\zeta(A) \le \inf I_\ge \tag{3.5}$$

and

$$\inf J_> \le \mathrm{Dim}_\zeta(A) \le \inf J_\ge. \tag{3.6}$$

To see that $\inf I_> \le \dim_\zeta(A)$, let $t > \dim_\zeta(A)$. Fix $t'$ with $t > t' > \dim_\zeta(A)$. Then, by the definition of $\dim_\zeta(A)$, there exist infinitely many $n \in \mathbb{Z}^+$ such that

$$|A \cap \{1, \dots, n\}| < n^{t'}. \tag{3.7}$$

If $n$ satisfies (3.7) and is large enough that $n^t > n^{t'} + 1$, fix $k$ such that $a_k \le n < a_{k+1}$. Then we have

$$a_{k+1}^t > n^t > n^{t'} + 1 > |A \cap \{1, \dots, n\}| + 1 = k + 1.$$

It follows that there exist infinitely many $k$ such that $a_k^t > k$, i.e., that $t \in I_>$, whence $\inf I_> \le t$. Since this holds for all $t > \dim_\zeta(A)$, it follows that $\inf I_> \le \dim_\zeta(A)$.

To see that $\dim_\zeta(A) \le \inf I_\ge$, let $t > \inf I_\ge$. Then there exist infinitely many $n \in \mathbb{Z}^+$ such that $a_n^t \ge n$. For each of these $n$, we have

$$|A \cap \{1, \dots, a_n\}| = n \le a_n^t,$$

so there exist infinitely many $m \in \mathbb{Z}^+$ such that

$$|A \cap \{1, \dots, m\}| \le m^t.$$

This implies that

$$\dim_\zeta(A) = \liminf_{m \to \infty} \frac{\log |A \cap \{1, \dots, m\}|}{\log m} \le t.$$

Since this holds for all $t > \inf I_\ge$, it follows that $\dim_\zeta(A) \le \inf I_\ge$. This completes the proof that (3.5) holds.

The proof that (3.6) holds is similar. □

4 Main Theorem 

The proof of our main theorem uses the following combinatorial lemma. 

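Lemma 4.1. For every $n \ge k \ge 2$ and every partition $\vec{a} = (a_0, \dots, a_{k-1})$ of $n$, there are more than

$$k^{\,n \mathcal{H}_k\left(\frac{\vec{a}}{n}\right) - (k+1) \log_k n}$$

integers $m$ with $|\sigma_k(m)| = n$ and $\#(i, \sigma_k(m)) = a_i$ for each $i \in \Sigma_k$.

Proof. Let $n \ge k \ge 2$, and let $\vec{a} = (a_0, \dots, a_{k-1})$ be a partition of $n$. Define the sets

$$B = \{\, u \in \Sigma_k^n \mid (\forall i \in \Sigma_k)\; \#(i, u) = a_i \,\},$$
$$C = \{\, m \in \mathbb{Z}^+ \mid \sigma_k(m) \in B \,\}.$$

Define an equivalence relation $\sim$ on $B$ by letting $u \sim v$ if and only if $v$ is a cyclic rotation of $u$, i.e., there exist strings $x, y$ with $u = xy$ and $v = yx$. Then each $\sim$-equivalence class has at most $n$ elements and contains $\sigma_k(m)$ for at least one $m \in C$, so

$$|C| \ge \frac{|B|}{n}.$$

Using multinomial coefficients and the well-known estimate $e\left(\frac{t}{e}\right)^t \le t! \le et\left(\frac{t}{e}\right)^t$, valid for all $t \in \mathbb{Z}^+$, we have

$$|B| = \binom{n}{a_0, \dots, a_{k-1}} = \frac{n!}{\prod_{i=0}^{k-1} a_i!} \ge e^{1-k} \left( \prod_{i=0}^{k-1} a_i \right)^{-1} \prod_{i=0}^{k-1} \left( \frac{n}{a_i} \right)^{a_i}.$$

Since the geometric mean is bounded by the arithmetic mean,

$$\prod_{i=0}^{k-1} a_i \le \left( \frac{1}{k} \sum_{i=0}^{k-1} a_i \right)^k = \left( \frac{n}{k} \right)^k.$$

Putting this all together, we have

$$|C| \ge \frac{k^k}{e^{k-1} n^{k+1}} \prod_{i=0}^{k-1} \left( \frac{n}{a_i} \right)^{a_i} > \frac{1}{n^{k+1}} \prod_{i=0}^{k-1} \left( \frac{n}{a_i} \right)^{a_i},$$

whence

$$\log_k |C| > \log_k \prod_{i=0}^{k-1} \left( \frac{n}{a_i} \right)^{a_i} - (k+1) \log_k n = n \mathcal{H}_k\!\left( \frac{\vec{a}}{n} \right) - (k+1) \log_k n.$$

□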
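For small parameters the lemma can be checked by brute force. The sketch below enumerates all length-$n$ strings over $\{0, \dots, k-1\}$ with a nonzero leading digit and a prescribed digit profile, and compares the count with the quantity $k^{n\mathcal{H}_k(\vec{a}/n) - (k+1)\log_k n}$. The function names are illustrative, and the enumeration is exponential in $n$, so it is only meant for tiny cases.

```python
import math
from itertools import product

def count_with_digit_profile(k, a):
    """Number of integers whose base-k expansion has length sum(a) and exactly a[i] digits equal to i."""
    n = sum(a)
    total = 0
    for u in product(range(k), repeat=n):
        # A standard base-k representation cannot start with the digit 0.
        if u[0] != 0 and all(u.count(i) == a[i] for i in range(k)):
            total += 1
    return total

def lemma_lower_bound(k, a):
    """k^(n*H_k(a/n) - (k+1)*log_k n) for a partition a of n into positive parts."""
    n = sum(a)
    entropy = sum((ai / n) * math.log(n / ai, k) for ai in a)
    return k ** (n * entropy - (k + 1) * math.log(n, k))

count_binary = count_with_digit_profile(2, (2, 2))      # 1001, 1010, 1100
count_ternary = count_with_digit_profile(3, (2, 2, 2))
```

In both examples the exact count is much larger than the bound, which reflects the slack introduced by the factor $n^{-(k+1)}$.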

We now have all the machinery that we need to prove the main result of this paper. 
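Theorem 4.2. (main theorem). Let $k \ge 2$.

1. For every infinite set $A \subseteq \mathbb{Z}^+$,

$$\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \dim_\zeta(A) \tag{4.1}$$

and

$$\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \mathrm{Dim}_\zeta(A). \tag{4.2}$$

2. For any four real numbers $\alpha, \beta, \gamma, \delta$ satisfying the inequalities

$$\begin{array}{ccccccc}
 & & \gamma & \le & \delta & \le & 1 \\
 & & \rotatebox[origin=c]{90}{$\le$} & & \rotatebox[origin=c]{90}{$\le$} & & \\
0 & \le & \alpha & \le & \beta, & &
\end{array} \tag{4.3}$$

there exists an infinite set $A \subseteq \mathbb{Z}^+$ such that $\dim_\zeta(A) = \alpha$, $\mathrm{Dim}_\zeta(A) = \beta$, $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) = \gamma$, and $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) = \delta$.

Proof. To prove part 1, let $A = \{a_1 < a_2 < \cdots\} \subseteq \mathbb{Z}^+$ be infinite. Fix $0 < s < t < 1$, let

$$J_t = \{\, n \in \mathbb{Z}^+ \mid a_n^t < n \,\},$$

and let $G = (Q, \Sigma_k, \delta, \beta, q_0)$ be an FSG. Let $n \in \mathbb{Z}^+$, and consider the quantity $d_G^{(s)}(w_n)$, where

$$w_n = \sigma_k(a_1) \cdots \sigma_k(a_n).$$

There exist states $q_1, \dots, q_n \in Q$ such that

$$d_G^{(s)}(w_n) = \prod_{i=1}^{n} d_{G_{q_i}}^{(s)}(\sigma_k(a_i)),$$

where $G_{q_i} = (Q, \Sigma_k, \delta, \beta, q_i)$. Let $B = \left\{ 1 \le i \le n \;\middle|\; d_{G_{q_i}}^{(s)}(\sigma_k(a_i)) \ge \frac{1}{k} \right\}$, and let $B^c = \{1, \dots, n\} - B$. Then

$$d_G^{(s)}(w_n) = \left( \prod_{i \in B} d_{G_{q_i}}^{(s)}(\sigma_k(a_i)) \right) \left( \prod_{i \in B^c} d_{G_{q_i}}^{(s)}(\sigma_k(a_i)) \right). \tag{4.4}$$

By our choice of $B$,

$$\prod_{i \in B^c} d_{G_{q_i}}^{(s)}(\sigma_k(a_i)) \le k^{|B| - n}. \tag{4.5}$$

By Lemma 3.3,

$$|B| \le \frac{c\, k^{2s+1} a_n^s}{k^s - 1}, \tag{4.6}$$

where $c = \mathrm{size}(G)^2 (2\,\mathrm{size}(G))^{\mathrm{size}(G)}$. Since $d_{G_{q_i}}^{(s)}(u) \le k^{s|u|}$ must hold in all cases, it follows that

$$\prod_{i \in B} d_{G_{q_i}}^{(s)}(\sigma_k(a_i)) \le k^{s|B||\sigma_k(a_n)|} \le k^{s|B|(1 + \log_k a_n)}. \tag{4.7}$$

By (4.4), (4.5), (4.6), and (4.7), we have

$$\log_k d_G^{(s)}(w_n) \le \tau (1 + s + s \log_k a_n) a_n^s - n, \tag{4.8}$$

where $\tau = \frac{c\, k^{2s+1}}{k^s - 1}$. If $n$ is sufficiently large, and if $n + 1 \in J_t$, then (4.8) implies that

$$\begin{aligned}
\log_k d_G^{(s)}(w_n) &\le \tau (1 + s + s \log_k a_n) a_n^s - 2(n+1)^{\frac{s+t}{2t}} \\
&\le \tau (1 + s + s \log_k a_n) a_n^s - 2 a_{n+1}^{\frac{s+t}{2}} \\
&\le \tau (1 + s + s \log_k a_n) a_n^s - a_n^{\frac{s+t}{2}} - s(1 + \log_k a_{n+1}) \\
&\le -s(1 + \log_k a_{n+1}) \\
&\le -s|\sigma_k(a_{n+1})|.
\end{aligned}$$

We have now shown that

$$d_G^{(s)}(w_n) \le k^{-s|\sigma_k(a_{n+1})|} \tag{4.9}$$

holds for all sufficiently large $n$ with $n + 1 \in J_t$.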

To prove (4.1), let $s < t < \dim_\zeta(A)$. It suffices to show that $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge s$. Since $t < \dim_\zeta(A)$, Lemma 3.4 tells us that the set $J_t$ is cofinite. Hence, for every sufficiently long prefix $w \sqsubseteq \mathrm{CE}_k(A)$, there exist $n$ and $u \sqsubseteq \sigma_k(a_{n+1})$ such that $w = w_n u$ and (4.9) holds, whence

$$d_G^{(s)}(w) \le k^{-s|\sigma_k(a_{n+1})|} k^{s|u|} \le 1.$$

This shows that $G$ does not $s$-succeed on $\mathrm{CE}_k(A)$, whence $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge s$.

To prove (4.2), let $s < t < \mathrm{Dim}_\zeta(A)$. It suffices to show that $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge s$. Since $t < \mathrm{Dim}_\zeta(A)$, Lemma 3.4 tells us that the set $J_t$ is infinite. For the infinitely many $n$ for which $n + 1 \in J_t$ and (4.9) holds, we then have $d_G^{(s)}(w_n) < 1$. This shows that $G$ does not strongly $s$-succeed on $\mathrm{CE}_k(A)$, whence $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge s$.

To prove part 2 of the theorem, let $\alpha$, $\beta$, $\gamma$, and $\delta$ be real numbers satisfying (4.3). We will explicitly construct an infinite set $A \subseteq \mathbb{Z}^+$ with the indicated dimensions. Intuitively, the values of $\dim_\zeta(A)$ and $\mathrm{Dim}_\zeta(A)$ will be achieved by controlling the density of $A$; the upper bounds on $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A))$ and $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A))$ will be achieved by constructing $A$ from integers whose base-$k$ expansions have controlled frequencies of digits (such integers being abundant by Lemma 4.1); and the lower bounds on $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A))$ and $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A))$ will be achieved by avoiding use of the very few (by Lemma 3.3) integers on whose base-$k$ expansions a finite-state gambler can win.

We first define some useful probability measures on $\Sigma_k$, all expressed as vectors. Let $\vec{\mu} = \left(\frac{1}{k}, \dots, \frac{1}{k}\right) \in \Delta(\Sigma_k)$ be the uniform probability measure, and let $\vec{\nu} = (1, 0, \dots, 0) \in \Delta(\Sigma_k)$ be the degenerate probability measure that concentrates all probability on 0. Define the function $g : [0, 1] \to \Delta(\Sigma_k)$ by

$$g(r) = r\vec{\mu} + (1 - r)\vec{\nu}.$$

Then $g$ defines a line segment from a corner $g(0) = \vec{\nu}$ to the centroid $g(1) = \vec{\mu}$ of the simplex $\Delta(\Sigma_k)$. Also, $\mathcal{H}_k \circ g : [0, 1] \to [0, 1]$ is strictly increasing and continuous, with $\mathcal{H}_k(g(0)) = 0$ and $\mathcal{H}_k(g(1)) = 1$. Let $r_\gamma = (\mathcal{H}_k \circ g)^{-1}(\gamma)$, $r_\delta = (\mathcal{H}_k \circ g)^{-1}(\delta)$, $\vec{\pi} = g(r_\gamma)$, and $\vec{\tau} = g(r_\delta)$, so that

$$\mathcal{H}_k(\vec{\pi}) = \gamma, \quad \mathcal{H}_k(\vec{\tau}) = \delta.$$

Then let $\vec{\pi}^{(k)}, \vec{\pi}^{(k+1)}, \vec{\pi}^{(k+2)}, \dots$ and $\vec{\tau}^{(k)}, \vec{\tau}^{(k+1)}, \vec{\tau}^{(k+2)}, \dots$ be sequences in $\Delta_{\mathbb{Q}}(\Sigma_k)$ with the following properties.

(i) For each $n \ge k$, $n\vec{\pi}^{(n)}$ and $n\vec{\tau}^{(n)}$ are partitions of $n$, with each $n\pi_i^{(n)} \ge \sqrt{n}$ and $n\tau_i^{(n)} \ge \sqrt{n}$ for $n \ge k^2$.

(ii) $\lim_{n \to \infty} \vec{\pi}^{(n)} = \vec{\pi}$ and $\lim_{n \to \infty} \vec{\tau}^{(n)} = \vec{\tau}$.

Note that (i) ensures that

H k {^) > ^-log fe n, H fc (fW) > |^ilog fc n (4.10) 

hold for all n > k 2 . 

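For each $u \in \Sigma_k^*$ and $s \in [0, \infty)$, let $\mathcal{G}_u$ be the set of all FSGs $G$ with $\mathrm{size}(G) \le \log_k \log_k |u|$, and let

$$d_{\mathcal{G}_u}^{(s)}(u) = \max_{G \in \mathcal{G}_u} d_G^{(s)}(u).$$

Define the sets

$$U = \left\{ a \ge k^{k-1} \;\middle|\; d_{\mathcal{G}_{\sigma_k(a)}}^{\left(\mathcal{H}_k(\vec{\pi}^{(|\sigma_k(a)|)})\right)}(\sigma_k(a)) > |\sigma_k(a)|^{k+2} \right\},$$
$$V = \left\{ a \ge k^{k-1} \;\middle|\; d_{\mathcal{G}_{\sigma_k(a)}}^{\left(\mathcal{H}_k(\vec{\tau}^{(|\sigma_k(a)|)})\right)}(\sigma_k(a)) > |\sigma_k(a)|^{k+2} \right\},$$
$$C = \left\{ a \ge k^{k-1} \;\middle|\; (\forall i \in \Sigma_k)\; \#(i, \sigma_k(a)) = |\sigma_k(a)|\, \pi_i^{(|\sigma_k(a)|)} \right\},$$
$$D = \left\{ a \ge k^{k-1} \;\middle|\; (\forall i \in \Sigma_k)\; \#(i, \sigma_k(a)) = |\sigma_k(a)|\, \tau_i^{(|\sigma_k(a)|)} \right\},$$
$$C' = C - U,$$
$$D' = D - V.$$

Then, for all $n \ge k$, we have

$$U_{=n} = \left\{ a \in \mathbb{Z}^+_{=n} \;\middle|\; d_{\mathcal{G}_{\sigma_k(a)}}^{\left(\mathcal{H}_k(\vec{\pi}^{(n)})\right)}(\sigma_k(a)) > n^{k+2} \right\},$$

so Lemma 3.3 tells us that

$$|U_{=n}| < \frac{k^{2\mathcal{H}_k(\vec{\pi}^{(n)}) + n\mathcal{H}_k(\vec{\pi}^{(n)})}\, t^2 (2t)^t}{n^{k+2} \left( k^{\mathcal{H}_k(\vec{\pi}^{(n)})} - 1 \right)}$$

for all $n \ge k$, where $t = \log_k \log_k n$. It follows easily from this that

$$|U_{=n}| = o\!\left( k^{\,n\mathcal{H}_k(\vec{\pi}^{(n)}) - (k+1)\log_k n} \right) \tag{4.11}$$

as $n \to \infty$. By Lemma 4.1, we have

$$|C_{=n}| > k^{\,n\mathcal{H}_k(\vec{\pi}^{(n)}) - (k+1)\log_k n}. \tag{4.12}$$

(By (4.10), this is positive for all sufficiently large $n$.) Putting (4.11) and (4.12) together with our choice of the $\vec{\pi}^{(n)}$ gives us

$$|C'_{=n}| \ge \max\{1, k^{(\alpha - o(1))n}\} \tag{4.13}$$

as $n \to \infty$. A similar argument shows that

$$|D'_{=n}| \ge \max\{1, k^{(\beta - o(1))n}\} \tag{4.14}$$

as $n \to \infty$. It follows that we can fix sets $C'' \subseteq C'$ and $D'' \subseteq D'$ such that

$$\max\{1, k^{(\alpha - o(1))n}\} \le |C''_{=n}| \le k^{(\alpha + o(1))n} \tag{4.15}$$

and

$$\max\{1, k^{(\beta - o(1))n}\} \le |D''_{=n}| \le k^{(\beta + o(1))n} \tag{4.16}$$

as $n \to \infty$.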

Now define $T : \mathbb{Z}^+ \to \mathbb{Z}^+$ by the recursion

$$T(1) = k, \quad T(l+1) = k^{T(l)},$$

so that $T(l)$ is an "exponential tower" $k^{k^{\cdot^{\cdot^{\cdot^{k}}}}}$ of height $l$. For each $n \ge k$, let $T^{-1}(n)$ be the unique $l$ such that $T(l) \le n < T(l+1)$. Let

$$C^* = \bigcup_{T^{-1}(n) \text{ even}} C''_{=n}, \quad D^* = \bigcup_{T^{-1}(n) \text{ odd}} D''_{=n},$$

and let

$$A = C^* \cup D^*.$$

This is our set $A$.
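The tower function and its inverse are simple to compute, which makes it easy to see how the construction alternates between the two kinds of blocks. The sketch below is illustrative only: the names `tower`, `tower_inverse`, and `block_type` are assumptions introduced here, and the loop guard on the bit length is an implementation detail that avoids forming astronomically large powers.

```python
def tower(k, l):
    """T(1) = k and T(l+1) = k ** T(l)."""
    t = k
    for _ in range(l - 1):
        t = k ** t
    return t

def tower_inverse(k, n):
    """The unique l with T(l) <= n < T(l+1), for integers n >= k >= 2."""
    l, t = 1, k
    # If t exceeds the bit length of n, then k**t >= 2**t > n, so the power need not be computed.
    while t <= n.bit_length() and k ** t <= n:
        t = k ** t
        l += 1
    return l

def block_type(k, n):
    """Which family supplies the length-n elements: 'C' when T^{-1}(n) is even, 'D' when it is odd."""
    return "C" if tower_inverse(k, n) % 2 == 0 else "D"
```

For $k = 2$ the towers are 2, 4, 16, 65536, so lengths 4 through 15 draw from the even (C) family and lengths 16 through 65535 from the odd (D) family. The rapidly growing block lengths are what let each block dominate everything that came before it.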

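We now note the following.

1. By (4.15),

$$\begin{aligned}
|A \cap \{1, \dots, k^{T(2l+1)-2}\}| &\le \sum_{n=1}^{T(2l)-1} |A_{=n}| + \sum_{n=T(2l)}^{T(2l+1)-1} |A_{=n}| \\
&\le \sum_{n=0}^{T(2l)-1} k^n + \sum_{n=T(2l)}^{T(2l+1)-1} k^{(\alpha + o(1))n} \\
&\le k^{T(2l)} + k^{(\alpha + o(1))(T(2l+1)-1)} \\
&= k^{(\alpha + o(1))T(2l+1)}
\end{aligned}$$

as $l \to \infty$, so (1.4) tells us that

$$\dim_\zeta(A) \le \liminf_{l \to \infty} \frac{\log_k |A \cap \{1, \dots, k^{T(2l+1)-2}\}|}{\log_k k^{T(2l+1)-2}} \le \liminf_{l \to \infty} \frac{(\alpha + o(1))T(2l+1)}{T(2l+1) - 2} = \alpha.$$

2. By (4.15), (4.16), and the fact that $\alpha \le \beta$,

$$|A \cap \{1, \dots, m\}| \ge \sum_{n=1}^{|\sigma_k(m)|-1} |A_{=n}| \ge \sum_{n=1}^{|\sigma_k(m)|-1} k^{(\alpha - o(1))n} \ge m^{\alpha - o(1)}$$

as $m \to \infty$, so (1.4) tells us that $\dim_\zeta(A) \ge \alpha$.

3. By (4.15), (4.16), and the fact that $\alpha \le \beta$,

$$|A \cap \{1, \dots, m\}| \le \sum_{n=1}^{|\sigma_k(m)|} |A_{=n}| \le \sum_{n=1}^{|\sigma_k(m)|} k^{(\beta + o(1))n} = k^{(\beta + o(1))|\sigma_k(m)|} = m^{\beta + o(1)}$$

as $m \to \infty$, so (1.3) tells us that $\mathrm{Dim}_\zeta(A) \le \beta$.

4. By (1.3) and (4.16),

$$\mathrm{Dim}_\zeta(A) \ge \limsup_{n \to \infty} \frac{\log_k |A_{=n}|}{\log_k (k^n - 1)} \ge \limsup_{n \to \infty} \frac{\log_k k^{(\beta - o(1))n}}{\log_k (k^n - 1)} = \beta.$$

These four things together show that $\dim_\zeta(A) = \alpha$ and $\mathrm{Dim}_\zeta(A) = \beta$.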

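Our next objective is to prove that $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \gamma$ and $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \delta$. For this, let $G = (Q, \Sigma_k, \delta, \beta, q_0)$ be an FSG, and let $s \in [0, \infty)$. It suffices to prove that

$$s < \gamma \implies G \text{ does not } s\text{-succeed on } \mathrm{CE}_k(A) \tag{4.17}$$

and

$$s < \delta \implies G \text{ does not strongly } s\text{-succeed on } \mathrm{CE}_k(A). \tag{4.18}$$

Write $A = \{a_1 < a_2 < \cdots\}$, so that

$$\mathrm{CE}_k(A) = \sigma_k(a_1)\sigma_k(a_2)\sigma_k(a_3)\cdots.$$

There is a sequence $q_1, q_2, q_3, \dots$ of states $q_i \in Q$ such that, for any $m \ge 0$ and any proper prefix $u \sqsubseteq \sigma_k(a_{m+1})$,

$$d_G^{(s)}(\sigma_k(a_1) \cdots \sigma_k(a_m) u) = \left( \prod_{i=0}^{m-1} d_{G_{q_i}}^{(s)}(\sigma_k(a_{i+1})) \right) d_{G_{q_m}}^{(s)}(u), \tag{4.19}$$

where $G_q = (Q, \Sigma_k, \delta, \beta, q)$. Let $c = \mathrm{size}(G)$. Note that, for all $q \in Q$, $\mathrm{size}(G_q) = c$, so

$$a \ge k^{k^{k^c}} \implies c \le \log_k \log_k \log_k a < \log_k \log_k |\sigma_k(a)| \implies G_q \in \mathcal{G}_{\sigma_k(a)}.$$

Since $C^* \cap U = \emptyset$, it follows that, for all $q \in Q$,

$$k^{k^{k^c}} \le a \in C^*_{=n} \implies d_{G_q}^{\left(\mathcal{H}_k(\vec{\pi}^{(n)})\right)}(\sigma_k(a)) \le n^{k+2}.$$

Using the identity $d_G^{(s)}(x) = k^{(s - s')|x|} d_G^{(s')}(x)$ and the facts that $\mathcal{H}_k(\vec{\pi}^{(n)}) = \gamma + o(1)$ and $n^{k+2} = k^{o(n)}$ as $n \to \infty$, we then have, for all $q \in Q$,

$$a \in C^*_{=n} \implies d_{G_q}^{(s)}(\sigma_k(a)) \le k^{(s - \gamma + o(1))n} \tag{4.20}$$

as $n \to \infty$. A similar argument shows that, for all $q \in Q$,

$$a \in D^*_{=n} \implies d_{G_q}^{(s)}(\sigma_k(a)) \le k^{(s - \delta + o(1))n} \tag{4.21}$$

as $n \to \infty$.

To verify (4.17), assume that $s < \gamma$. Then, since $\gamma \le \delta$, (4.20) and (4.21) tell us that

$$d_{G_{q_i}}^{(s)}(\sigma_k(a_{i+1})) \le k^{(s - \gamma + o(1))|\sigma_k(a_{i+1})|}$$

as $i \to \infty$. It follows by (4.19) that, for any prefix $w \sqsubseteq \mathrm{CE}_k(A)$, if we write $w = \sigma_k(a_1) \cdots \sigma_k(a_m) u$, where $u \sqsubseteq \sigma_k(a_{m+1})$, then $|u| = o(|w|)$ as $|w| \to \infty$, so

$$\begin{aligned}
d_G^{(s)}(w) &\le \left( \prod_{i=0}^{m-1} k^{(s - \gamma + o(1))|\sigma_k(a_{i+1})|} \right) k^{s|u|} \\
&= k^{(s - \gamma + o(1))(|w| - |u|) + s|u|} \\
&= k^{(s - \gamma + o(1))|w|}
\end{aligned}$$

as $|w| \to \infty$. Since $s < \gamma$, it follows that

$$\limsup_{n \to \infty} d_G^{(s)}(\mathrm{CE}_k(A)[0..n-1]) = 0,$$

affirming (4.17).

To verify (4.18), assume that $s < \delta$. For each $l \in \mathbb{Z}^+$, let

$$v_l = \sigma_k(a_{i_l})\sigma_k(a_{i_l + 1}) \cdots \sigma_k(a_{i_{l+1} - 1}),$$

where $i_l$ is the least $i$ such that $|\sigma_k(a_i)| = T(l)$, and let

$$w_l = v_1 v_2 \cdots v_{l-1},$$

noting that each $w_l \sqsubseteq \mathrm{CE}_k(A)$. Then $|w_l| = o(|v_l|)$ as $l \to \infty$, so

$$\begin{aligned}
d_G^{(s)}(w_{2l}) &\le k^{s|w_{2l-1}|} \prod_{i=i_{2l-1}}^{i_{2l}-1} d_{G_{q_{i-1}}}^{(s)}(\sigma_k(a_i)) \\
&\le k^{s|w_{2l-1}|} \prod_{i=i_{2l-1}}^{i_{2l}-1} k^{(s - \delta + o(1))|\sigma_k(a_i)|} \\
&= k^{s|w_{2l-1}| + (s - \delta + o(1))|v_{2l-1}|} \\
&= k^{(s - \delta + o(1))|v_{2l-1}|}
\end{aligned}$$

as $l \to \infty$. Since $s < \delta$, this affirms (4.18) and concludes the proof that $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \gamma$ and $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \ge \delta$.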

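All that remains is to prove that $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le \gamma$ and $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le \delta$. For each rational $r \in \mathbb{Q} \cap [0, 1]$, let $G_r$ be the 1-state FSG whose bets are given by $g(r)$, where $g : [0, 1] \to \Delta(\Sigma_k)$ is the function defined earlier in this proof. That is, for all $s \in [0, \infty)$, $w \in \Sigma_k^*$, and $a \in \Sigma_k$, we have

$$d_{G_r}^{(s)}(wa) = k^s g(r)(a)\, d_{G_r}^{(s)}(w).$$

If we write $\theta_w(a) = \frac{\#(a, w)}{|w|}$ for all $w \in \Sigma_k^+$ and $a \in \Sigma_k$, then this implies that, for all $w \in \Sigma_k^+$,

$$d_{G_r}^{(s)}(w) = k^{s|w|} \prod_{a \in \Sigma_k} g(r)(a)^{\#(a, w)},$$

whence

$$\begin{aligned}
\log_k d_{G_r}^{(s)}(w) &= s|w| + \sum_{a \in \Sigma_k} \#(a, w) \log_k g(r)(a) \\
&= |w| \left( s + \sum_{a \in \Sigma_k} \theta_w(a) \log_k g(r)(a) \right) \\
&= |w| \left( s - \mathcal{H}_k(\theta_w) - \mathcal{D}_k(\theta_w \,\|\, g(r)) \right).
\end{aligned}$$

We have thus shown that

$$d_{G_r}^{(s)}(w) = k^{\left( s - \mathcal{H}_k(\theta_w) - \mathcal{D}_k(\theta_w \| g(r)) \right)|w|} \tag{4.22}$$

holds for all $r \in \mathbb{Q} \cap [0, 1]$, $s \in [0, \infty)$, and $w \in \Sigma_k^+$.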
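The closed form for the capital of a one-state gambler can be confirmed numerically by comparing a step-by-step simulation against the entropy-plus-divergence expression. The sketch below uses floating-point values of $r$ rather than exact rationals, and the function names are illustrative assumptions.

```python
import math

def g(r, k):
    """Mixture r*uniform + (1-r)*point mass on symbol 0, as a list of k probabilities."""
    return [r / k + (1 - r)] + [r / k] * (k - 1)

def one_state_gale(s, w, r, k):
    """Capital of the one-state gambler betting g(r) on every symbol, starting from capital 1."""
    bets = g(r, k)
    capital = 1.0
    for a in w:
        capital *= (k ** s) * bets[a]
    return capital

def closed_form(s, w, r, k):
    """k^((s - H_k(theta_w) - D_k(theta_w || g(r))) * |w|) for a nonempty string w."""
    n = len(w)
    theta = [w.count(a) / n for a in range(k)]   # empirical symbol frequencies of w
    bets = g(r, k)
    entropy = sum(p * math.log(1 / p, k) for p in theta if p > 0)
    divergence = sum(p * math.log(p / b, k) for p, b in zip(theta, bets) if p > 0)
    return k ** ((s - entropy - divergence) * n)

sample = [0, 0, 1, 2, 0, 1, 0, 0]
simulated = one_state_gale(0.7, sample, 0.5, 3)
predicted = closed_form(0.7, sample, 0.5, 3)
```

The two values agree up to rounding, since the sum of entropy and divergence is just the cross-entropy of the empirical frequencies against the fixed bets.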

We now note a useful property of the function $g$. If we fix $r \in (0, 1]$, then

$$\frac{d}{dx} \left[ \mathcal{H}_k(g(x)) + \mathcal{D}_k(g(x) \,\|\, g(r)) \right] = \frac{k-1}{k} \log_k \frac{k + r - kr}{r} \ge 0,$$

so

$$q \le r \implies \mathcal{H}_k(g(q)) + \mathcal{D}_k(g(q) \,\|\, g(r)) \le \mathcal{H}_k(g(r)). \tag{4.23}$$

For each $n \in \mathbb{Z}^+$, let $\theta_n = \theta_{w_n}$, where $w_n = \mathrm{CE}_k(A)[0..n-1]$ is the string consisting of the first $n$ symbols in $\mathrm{CE}_k(A)$. Then $\theta_1, \theta_2, \dots$ is an infinite sequence of probability vectors in the simplex $\Delta(\Sigma_k)$. For every $n$ such that $T^{-1}(n)$ is even, $A_{=n} = C^*_{=n}$ consists entirely of integers $a$ for which $\theta_{\sigma_k(a)} = \vec{\pi}^{(n)}$, and for every $n$ such that $T^{-1}(n)$ is odd, $A_{=n} = D^*_{=n}$ consists entirely of integers $a$ for which $\theta_{\sigma_k(a)} = \vec{\tau}^{(n)}$. Since $\vec{\pi}^{(n)}$ converges to $g(r_\gamma)$, $\vec{\tau}^{(n)}$ converges to $g(r_\delta)$, and $T$ grows very rapidly, it follows easily that the set of limit points of the sequence $\theta_1, \theta_2, \dots$ is precisely the closed line segment $g([r_\gamma, r_\delta])$ (which is a point if $\gamma = \delta$).

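To see that $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le \gamma$, assume that $\gamma < s \le 1$. It suffices to show that $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le s$. For this, fix $r \in \mathbb{Q} \cap (r_\gamma, (\mathcal{H}_k \circ g)^{-1}(s))$. Since $g(r_\gamma)$ is a limit point of $\theta_1, \theta_2, \dots$, there is a sequence $n_1 < n_2 < \cdots$ of positive integers such that $\lim_{i \to \infty} \theta_{n_i} = g(r_\gamma)$. By (4.22), (4.23), and the continuity of $\mathcal{H}_k(x) + \mathcal{D}_k(x \,\|\, g(r))$ as a function of $x$, we then have

$$\begin{aligned}
d_{G_r}^{(s)}(w_{n_i}) &= k^{\left( s - \mathcal{H}_k(\theta_{n_i}) - \mathcal{D}_k(\theta_{n_i} \| g(r)) \right) n_i} \\
&\ge k^{\left( s - \mathcal{H}_k(g(r)) - o(1) \right) n_i}
\end{aligned}$$

as $i \to \infty$. Since $\mathcal{H}_k(g(r)) < s$, it follows that $G_r$ $s$-succeeds on $\mathrm{CE}_k(A)$, whence $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le s$.

To see that $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le \delta$, assume that $\delta < s \le 1$. It suffices to show that $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le s$. For this, fix $r \in \mathbb{Q} \cap (r_\delta, (\mathcal{H}_k \circ g)^{-1}(s))$. For each $n \in \mathbb{Z}^+$, let $g(q_n)$ be the point on the line segment $g([r_\gamma, r_\delta])$ that is closest to $\theta_n$. Since $g([r_\gamma, r_\delta])$ contains every limit point of $\theta_1, \theta_2, \dots$, $\Delta(\Sigma_k)$ is compact, and $\mathcal{H}_k(x) + \mathcal{D}_k(x \,\|\, g(r))$ is a continuous function of $x$, we have

$$\mathcal{H}_k(\theta_n) + \mathcal{D}_k(\theta_n \,\|\, g(r)) = \mathcal{H}_k(g(q_n)) + \mathcal{D}_k(g(q_n) \,\|\, g(r)) + o(1) \tag{4.24}$$

as $n \to \infty$. By (4.22), (4.23), and (4.24),

$$d_{G_r}^{(s)}(w_n) \ge k^{\left( s - \mathcal{H}_k(g(r)) - o(1) \right) n}$$

as $n \to \infty$. Since $\mathcal{H}_k(g(r)) < s$, it follows that $G_r$ strongly $s$-succeeds on $\mathrm{CE}_k(A)$, whence $\mathrm{Dim}_{\mathrm{FS}}(\mathrm{CE}_k(A)) \le s$. □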

Finally, we note that the Copeland-Erdos theorem is a special case of our main theorem. 

Corollary 4.3. (Copeland and Erdős [8]). Let $k \ge 2$ and $A \subseteq \mathbb{Z}^+$. If, for all $\alpha < 1$, for all sufficiently large $n \in \mathbb{Z}^+$, $|A \cap \{1, \dots, n\}| > n^\alpha$, then the sequence $\mathrm{CE}_k(A)$ is normal over the alphabet $\Sigma_k$. In particular, the sequence $\mathrm{CE}_k(\mathrm{PRIMES})$ is normal over the alphabet $\Sigma_k$.

Proof. The hypothesis implies that $\dim_\zeta(A) \ge \alpha$ for all $\alpha < 1$, i.e., that $\dim_\zeta(A) = 1$. By Theorem 4.2, this implies that $\dim_{\mathrm{FS}}(\mathrm{CE}_k(A)) = 1$, which is equivalent [16, 15] to the normality of